Homework 5 - Documenting an HR database
As a highly prized and chased database wizard, you’ve been hired at an exorbitant hourly rate asked to document an HR database of a fortune 1_000_000 company.
You’ve been given credentials to the database. (See course slides and/or lecture video!)
Your tasks are to submit a report that contains:
- A general written description of contents of database,
- A Chen diagram highlighting the major entities and relations,
- A Crows-foot diagram highlighting the entities, attributes, and the cardinality and participation of the relations,
- A python-generated listing of SQL tables and fields, and
- A listing of sample data showing the first bunch (5 or 10 or so) of EACH table.
- All this will be documented in the report and the HTML report will be uploaded to canvas.
Github repo: https://github.com/cmsc-vcu/cmsc408-fa2025-hw5-CaptnKristmas
Database design
The following sections document the general design of the HR database.
General description
This database houses the HR information for a company. It stores and organizes informaiton about: employees, jobs, departments, locations, and previous employment. Some of the things it tracks includes: salaries of emplyees, their previous work history, who works in what department, their positions and the locations of departments.
The HR database contains a total of 9 tables 1 of which being a View and the other 8 Base Tables. These include: countries, departments, emp_details_view, employees, job_grades, job_history, jobs, locations and regions. Each holds parts of the information and are connected via relationships showing how that information is connected in different ways. Through the following Chen and Crow’s Foot diagrams one will be able to see how these relationships work.
Chen diagram
The following diagram shows the relationships between tables in the database using a chen diagram.
Crows foot diagram
The following diagram shows the relationships between tables as a crows foot diagram.
erDiagram
REGIONS {
int region_id PK
varchar(25) region_name
}
COUNTRIES {
char(2) country_id PK
varchar(40) country_name
int region_id FK
}
LOCATIONS {
int location_id PK
varchar(40) street_address
varchar(12) postal_code
varchar(30) city
varchar(25) state_province
char(2) country_id FK
}
DEPARTMENTS {
int department_id PK
varchar(30) department_name
int manager_id FK
int location_id FK
}
JOBS {
varchar(10) job_id PK
varchar(35) job_title
float min_salary
float max_salary
}
EMPLOYEES {
int employee_id PK
varchar(20) first_name
varchar(25) last_name
varchar(25) email UK
varchar(20) phone_number
date hire_date
varchar(10) job_id FK
decimal salary
decimal commission_pct
int manager_id FK
int department_id FK
}
JOB_HISTORY {
int employee_id PK
date start_date PK
date end_date PK
varchar(10) job_id FK
int department_id FK
}
JOB_GRADES {
varchar(3) grade_level PK
float lowest_sal
float highest_sal
}
REGIONS ||--o{ COUNTRIES : "has"
COUNTRIES ||--o{ LOCATIONS : "has"
LOCATIONS ||--o{ DEPARTMENTS : "hosts"
DEPARTMENTS ||--o{ EMPLOYEES : "employs"
JOBS ||--o{ EMPLOYEES : "assigned"
EMPLOYEES ||--o{ JOB_HISTORY : "has"
JOBS ||--o{ JOB_HISTORY : "for_job"
DEPARTMENTS o|--o{ JOB_HISTORY : "worked_in"
EMPLOYEES ||--o{ EMPLOYEES : "manages"
JOB_GRADES ||--o{ EMPLOYEES : "grade"
Listing of tables in the database
The python blocks that follow show how to login and pull from a database in order to create an html that displays information from said database. The python code blocks are what is needed to be used in order to achieve such a task.
Create connection
This code is loading credentials from the environment file (.env) in order to log into the database to pull further information.
from itables import show
from helpers import create_database_engine, run_sql_and_return_df, run_sql_and_return_html, create_db_wrapper
# Load these variables from .env file.
config_map = {
'user' : "MYSQL_USERNAME",
'password': "MYSQL_PASSWORD",
'host' : "MYSQL_HOST",
'database': "HW5_DB"
}
cnx,config = create_db_wrapper( config_map )List of tables found in HR database
This querries the Information_schema in MySQL in order to return a list of the tables in the database.
df = run_sql_and_return_df(cnx,f"""
SELECT
TABLE_NAME, TABLE_TYPE
FROM
INFORMATION_SCHEMA.TABLES
WHERE
TABLE_SCHEMA = '{config['database']}'
ORDER BY
TABLE_NAME
""",cnx)
show(df)
cnx,config = create_db_wrapper( config_map ) | Loading ITables v2.5.2 from the internet... (need help?) |
Listing of Tables and sample data
The following shows what data is stored on the database along with explanations of the various tables. It should help one understand how the data relates to each other.
Table: countries
The countries table contains data such as the type, the type of key and the field. It contains data relating to countries and regions.
Schema: countries
# Table schema
df = run_sql_and_return_df(cnx,"""
DESCRIBE countries
""")
show(df)| Loading ITables v2.5.2 from the internet... (need help?) |
Sample data: countries
df = run_sql_and_return_df(cnx,"""
SELECT * FROM countries LIMIT 10""")
show(df)| Loading ITables v2.5.2 from the internet... (need help?) |
Table: departments
The departments table contains data such as the field, type, and type of key. It contains data relating to departments their locations and managers.
Schema: departments
# Table schema
df = run_sql_and_return_df(cnx,"""
DESCRIBE departments
""")
show(df)| Loading ITables v2.5.2 from the internet... (need help?) |
Sample data: departments
df = run_sql_and_return_df(cnx,"""
SELECT * FROM departments LIMIT 10""")
show(df)| Loading ITables v2.5.2 from the internet... (need help?) |
Table: emp_details_view
The employee details view table contains data such as the field and type. It contains data relating to employees, their departments, locations, managers and jobs.
Schema: emp_details_view
# Table schema
df = run_sql_and_return_df(cnx,"""
DESCRIBE emp_details_view
""")
show(df)| Loading ITables v2.5.2 from the internet... (need help?) |
Sample data: emp_details_view
df = run_sql_and_return_df(cnx,"""
SELECT * FROM emp_details_view LIMIT 10""")
show(df)| Loading ITables v2.5.2 from the internet... (need help?) |
Table: employees
The employees table contains data such as the field, type and key. It contains data relating to employees, their departments, locations, managers and jobs. This expands on the previous table by adding information such as employees emails and phone numbers along with their names, hire dates, salaries and commision percentage.
Schema: employees
# Table schema
df = run_sql_and_return_df(cnx,"""
DESCRIBE employees
""")
show(df)| Loading ITables v2.5.2 from the internet... (need help?) |
Sample data: employees
df = run_sql_and_return_df(cnx,"""
SELECT * FROM employees LIMIT 10""")
show(df)| Loading ITables v2.5.2 from the internet... (need help?) |
Table: job_grades
The job_grades table contains data such as the field, type and the key. This data relates to the highest and lowest salaries for a jobs grade.
Schema: job_grades
# Table schema
df = run_sql_and_return_df(cnx,"""
DESCRIBE job_grades
""")
show(df)| Loading ITables v2.5.2 from the internet... (need help?) |
Sample data: job_grades
df = run_sql_and_return_df(cnx,"""
SELECT * FROM job_grades LIMIT 10""")
show(df)| Loading ITables v2.5.2 from the internet... (need help?) |
Table: job_history
The job_history table contains data such as the field, type and the key. This data relates employees to their start and end dates at the company.
Schema: job_history
# Table schema
df = run_sql_and_return_df(cnx,"""
DESCRIBE job_history
""")
show(df)| Loading ITables v2.5.2 from the internet... (need help?) |
Sample data: job_history
df = run_sql_and_return_df(cnx,"""
SELECT * FROM job_history LIMIT 10""")
show(df)| Loading ITables v2.5.2 from the internet... (need help?) |
Table: jobs
The jobs table contains data such as field type and key. This data relates jobs to their max and minimum salaries.
Schema: jobs
# Table schema
df = run_sql_and_return_df(cnx,"""
DESCRIBE jobs
""")
show(df)| Loading ITables v2.5.2 from the internet... (need help?) |
Sample data: jobs
df = run_sql_and_return_df(cnx,"""
SELECT * FROM jobs LIMIT 10""")
show(df)| Loading ITables v2.5.2 from the internet... (need help?) |
Table: locations
The locations table contains data such as the field, type and key. This data relates the street addresses on file to their respective cities, postal codes, states/provinces and countries.
Schema: locations
# Table schema
df = run_sql_and_return_df(cnx,"""
DESCRIBE locations
""")
show(df)| Loading ITables v2.5.2 from the internet... (need help?) |
Sample data: locations
df = run_sql_and_return_df(cnx,"""
SELECT * FROM locations LIMIT 10""")
show(df)| Loading ITables v2.5.2 from the internet... (need help?) |
Table: regions
The regions table contains data such as field, type and key. This data relates to a list of regions.
Schema: regions
# Table schema
df = run_sql_and_return_df(cnx,"""
DESCRIBE regions
""")
show(df)| Loading ITables v2.5.2 from the internet... (need help?) |
Sample data: regions
df = run_sql_and_return_df(cnx,"""
SELECT * FROM regions LIMIT 10""")
show(df)| Loading ITables v2.5.2 from the internet... (need help?) |
Reflection
- Please add a paragraph or two about how this project went.
- I discuss this further in the section asking about how the information provided was. I encountered major difficulties during set-up that took multiple full days to resolve. Unfortunately, this was by far the hardest set-up I have ever encountered. I once had issues setting up Windows Side Loader but that still was miles smoother than this experience.
I did not encounter any massive issues while working on the project itself once everything was downloaded beyond some syntax errors that took a while to correct. I also encountered a weird issue where I had to uninstall itables, the pip cache and jupyter. Once reinstalling the issue was resolved.
- Was it harder or easier than you expected?
- This ended up being easier then expected once everything was installed and working but up to that point it was incredibly difficult.
- Did the instructor provide too much information or not enough information?
- I encountered issues while following the directions and ended up having to create work around in order to complete the process. Due to these work arounds I had issues having the libraries pre working on this poetry shell. As such I had to download every library the project uses. Additionally, despite the base python 3.11.6 having Jupyter it did not extend to the poetry shell. I was unable to see an environment for the shell in the terminal and had to create an environement specifically in the folder of the repo. This likely caused the issues but unfortunately I was unable to do this the way the directions instructed.
One major recommendation I have is to double check previous Python installs. I somehow had multiple of the same python version installed and had to go through and clean that out. This process along with properly setting up shims and the bin folder for pyenv caused the issues that ultimately led to the poetry workaround.
Poetry required an uninstall which after was not working due to the system seeing it as installed despite it not appear anywhere. This led to workaround that caused poetry env info to show no environement. Ultimately ending up at the solution I mentioned.
- Do you have suggestions for how it can be improved?
- Better clarification for error solving and possibly a warning to check certain things prior to starting. In addition to the small uninstalls it asks you to make to double check.
README
Homework 5 — Documenting an HR Database
Author: Vladimir Paraschiv
Course: CMSC 408 – Database Systems
Semester: Fall 2025
Date: October 17, 2025
Repository
📘 Overview
This project documents a Human Resources (HR) database for a large enterprise system.
The assignment focuses on exploring an existing relational schema, understanding how each table contributes to the organization’s HR functions, and visualizing relationships between entities using Chen and Crow’s Foot notations.
The report integrates SQL-generated metadata, sample data, and ER diagrams, offering both a conceptual and physical view of the database.
The deliverable demonstrates proficiency in connecting to a live database, querying the SQL Database, and displaying formatted tables directly within a Quarto HTML report.
🧩 Tools and Technologies
This project uses:
- Quarto — for professional report generation combining text, SQL, and visuals
- Graphviz (DOT) — to produce a Chen-style ER diagram
- Mermaid — to produce a Crow’s Foot diagram with cardinalities
- Python (3.11.6) — for database connection and querying
- itables — for rendering interactive HTML tables
- PyMySQL / SQLAlchemy / dotenv — for MySQL access and environment-variable handling
- Poetry — to manage dependencies and virtual environments
- Pyenv — to create isolated Python environments
🧠 Database Summary
The HR database consists of nine tables (8 base tables + 1 view) that collectively describe a company’s workforce, structure, and geography:
- regions, countries, locations, departments, employees, jobs, job_history, job_grades, and emp_details_view
These tables capture hierarchical and referential relationships such as: - Region → Country → Location → Department → Employee
- Jobs and Job Grades defining employee salary ranges and titles
- Historical data for past positions and department changes
The diagrams and table listings reveal how foreign-key relationships enforce integrity across this structure.
🧪 Rendering Instructions
To render the Quarto report into a standalone HTML document, run in Virtual Studio Code terminal: cd ./reports and then: quarto render report.qmd
Alternatively, simply open the pre-rendered report.html file included in this repository.